TECHNICAL REPORT NO. 3

THE ACTUAL AND POTENTIAL ASSOCIATION
OF IDEAS IN INFORMATION SYSTEMS

In Technical Report No. 1 we considered the problem of the
relation of the number of possible associations of ideas in an
information system to the actual number of associations.

In an information system of $n$ terms, the number of possible
associations (using only one mode of association, namely, logical
conjunction) is $2^n - 1$; but we recognize intuitively that there is
a much smaller number of actual associations. Suppose in a system
of information using 5000 terms to analyze a collection of
documents, no individual document required more than 10 terms for a
complete analysis of its contents. It would follow that any
association of 11 terms would be an empty function, i.e., there would
be no information in the system corresponding to any association
of 11 terms. This does not give us the number of actual
associations, but it sets an upper limit to the number of possible
functions which, although still very large, is much smaller than
$2^n - 1$.
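The effect of the 10-term ceiling can be checked numerically. The following is a minimal sketch in Python, assuming that an association is any non-empty set of distinct terms; the function names are illustrative and do not come from the report.

```python
from math import comb


def all_conjunctions(n: int) -> int:
    """Number of non-empty subsets of n terms, i.e. 2^n - 1."""
    return 2**n - 1


def conjunctions_up_to(n: int, max_len: int) -> int:
    """Number of non-empty subsets of n terms having at most max_len members."""
    return sum(comb(n, k) for k in range(1, max_len + 1))


unrestricted = all_conjunctions(5000)
capped = conjunctions_up_to(5000, 10)

# Compare the two counts by their number of decimal digits.
print(len(str(unrestricted)), len(str(capped)))
```

With no ceiling the two functions agree (for 10 terms both give 1023), which is why the ceiling only matters when it is far below the size of the vocabulary.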

The number of possible associations of 10 terms in a system
of 5000 terms is, of course,

$$\frac{5000 \times 4999 \times 4998 \times \cdots \times 4991}{10!}$$

The magnitude of this number is approximately

$$\frac{5^{10} \times 10^{30}}{2 \times 10^{6}}$$
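The count of 10-term associations can be evaluated exactly with integer arithmetic. This is a small sketch, assuming the approximate figure in the text is $5^{10} \times 10^{30}$ divided by $2 \times 10^{6}$; the variable names are illustrative.

```python
from math import comb, factorial, prod

# Exact number of ways to choose 10 terms out of 5000.
exact = comb(5000, 10)

# The same value written as the product 5000 x 4999 x ... x 4991 over 10!.
by_product = prod(range(4991, 5001)) // factorial(10)

# The rough magnitude used in the text (assumed reading of the formula).
report_estimate = (5**10 * 10**30) // (2 * 10**6)

print(f"{exact:.3e}")
print(f"{report_estimate:.3e}")
```

Both values are of order $10^{30}$, so the rough figure serves the argument even though it is not the exact quotient.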

Since the number of maximum sized associations in a system of
information may be less but can never be more than the number of
documents in the system, the system would have to cover

$$\frac{5^{10} \times 10^{30}}{2 \times 10^{6}}$$

documents in order to exhibit this number of actual associations
of 10 terms. If not all possible associations of 10 terms are
actual, then not all possible associations of 8, 7, etc.
terms will be actual.

Hence, in terms of the conditions we have established so
far we can conclude:

1.  There will be no associations longer
than 10 terms; and

2.  Not all associations of 10 or fewer terms
will be actual.

We can establish an equivalence between the number of maximum
sized associations and the number of documents in the system
by assuming that each document in the system is analyzed into an
association of 10 terms and that each association of 10 terms is
unique; that is to say, two items might be analyzed into 9 identical
terms, but the tenth term will serve to distinguish the analysis of
any document from all the others. Thus, if there are 50,000
documents in the system, there will be 50,000 different associations of
10 terms. We can also calculate the maximum and minimum number of
different associations of 1 term, 2 terms, 3 terms, etc. in such a
system.

The number of associations of 1 term, 2 terms, 3 terms,
4 terms, etc. in a set of 10 terms is again $2^n - 1$ or 1023. This
means that a single document represented by an association of 10
terms represents a set of 1023 different associations. If each
set of associations were unique, the number of associations in
the system would then be 50,000 x 1023. But we know from our
conditions that this isn't true, since, if there are only 5,000
terms in the system, some of the associations in one set will be
equivalent to associations in other sets. Consider, for example,
the following sets of associations:

ABCDEFGHIJ

ABCDEFGHIK

Each set represents $2^{10} - 1$ or 1023 associations, but $2^{9} - 1$
associations will be common to the two sets. Hence, the number of
different associations in these two sets is equal to
$(2^{10} - 1) + (2^{10} - 2^{9})$.
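The overlap between two such sets can be confirmed by direct enumeration. The sketch below takes the two sets to be ABCDEFGHIJ and ABCDEFGHIK and treats each association as an unordered, non-empty combination of letters; the helper name `associations` is illustrative.

```python
from itertools import combinations


def associations(terms: str) -> set[frozenset[str]]:
    """All non-empty combinations of the given terms, order ignored."""
    return {
        frozenset(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


first = associations("ABCDEFGHIJ")
second = associations("ABCDEFGHIK")

shared = first & second    # associations built only from the 9 common terms
distinct = first | second  # every association found in either set

print(len(first), len(shared), len(distinct))
```

The second set adds only those associations that contain its one new term, which is where the $2^{10} - 2^{9}$ term comes from.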

The condition under which we will have the minimum number
of associations is, of course, the condition of the maximum degree
of common terms in each combination of ten terms. If we varied
our initial conditions and supposed that there were 50,009 terms
in our system used to analyze 50,000 documents, then the formula
for the minimum number of associations would be

$$1023 + [49{,}999 \times (2^{10} - 2^{9})]$$

This formula is for the condition in which all 50,000
documents have 9 terms common and differ by only one term and
requires, as we have said, 50,009 terms. With only 5000 terms we
can have only 4991 documents having 9 terms in common; for these
4991 documents the number of associations will be

$$1023 + [4990 \times (2^{10} - 2^{9})]$$
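The pattern behind these two formulas can be tested on a scaled-down case. The sketch below assumes every document shares all but one of its terms with every other document, as the text describes; the function names and the generated term labels are hypothetical.

```python
from itertools import combinations


def min_associations(documents: int, terms_per_doc: int = 10) -> int:
    """Closed form: the first document gives 2^t - 1 associations,
    and each further document adds 2^t - 2^(t-1) new ones."""
    t = terms_per_doc
    return (2**t - 1) + (documents - 1) * (2**t - 2 ** (t - 1))


def count_by_enumeration(documents: int, terms_per_doc: int) -> int:
    """Count distinct associations by listing them (small inputs only)."""
    shared = [f"s{i}" for i in range(terms_per_doc - 1)]
    seen = set()
    for d in range(documents):
        terms = shared + [f"u{d}"]  # one distinguishing term per document
        for size in range(1, len(terms) + 1):
            for combo in combinations(terms, size):
                seen.add(frozenset(combo))
    return len(seen)


# Small check: 6 documents of 5 terms each, 4 terms in common.
print(count_by_enumeration(6, 5), min_associations(6, 5))

# The two cases discussed in the text.
print(min_associations(50_000), min_associations(4_991))
```

The enumeration is only practical for small inputs; the closed form is what scales to 50,000 documents.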

But it is possible in a system of 5000 terms to have $\binom{5000}{9}$
absolutely unique combinations of 9 terms. Each of these unique
sets of 9 terms can be varied 4991 times by the addition of an
additional term. To provide for our 50,000 documents with a
minimum number of associations, we need utilize only 10 sets, each of
which includes 4991 combinations differing by one term. Our formula
then is

$$10 \times [1023 + 4990 \times (2^{10} - 2^{9})]$$

This number is roughly 25,000,000, which is the minimum number of
associations in a system of 50,000 documents and 5,000 terms in
which each document is uniquely indexed and all documents are
indexed by 10 terms. We need not attempt to calculate the maximum,
since we know that it will be less than $50{,}000 \times (2^{10} - 1)$,
or roughly 50,000,000, and is of the same order of magnitude as
the minimum.
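The minimum and the ceiling on the maximum can be worked out directly. This is a short arithmetic sketch, assuming ten sets of 4991 documents each and 10 terms per document; the variable names are illustrative.

```python
# Associations contributed by one set of 4991 documents sharing 9 terms.
per_set = (2**10 - 1) + 4990 * (2**10 - 2**9)

# Ten such sets, treated as contributing separately (as the formula does).
# Note: 10 x 4991 = 49,910 documents, so this approximates the 50,000 case.
minimum = 10 * per_set

# Ceiling: every one of 50,000 documents contributes 1023 distinct associations.
ceiling = 50_000 * (2**10 - 1)

print(minimum, ceiling, round(ceiling / minimum, 2))
```

The ratio between the two figures is close to 2, which is the sense in which the maximum is of the same order of magnitude as the minimum.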

The magnitude of the minimum is significant because it
indicates that mechanization cannot take the form of recording and
searching for each one of the 25,000,000 associations as individual
entities on a punched card or even a magnetic drum.

It will be recalled that we began this investigation with
the realization that if we mechanized the manipulation of terms
rather than documents, the number of elements to be handled by
our machine is appreciably reduced. Now we see that our terms
have an alarming number of associations and the problem
[...] of including all associations virtually, potentially
[...] actually. In fact, what should be mechanized in the
[...] dictionary is the presentation of any association [...]
through the manipulation of a much smaller number of
[...] rather than the actual storage and manipulation of all
[...]. Amplifications and elaboration of this concept
will be presented in subsequent reports.
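One way to picture the closing idea, presenting an association on demand instead of storing it, is a table keyed by term. The sketch below is an illustration under that assumption and not the mechanism the report goes on to specify; `build_term_index`, `lookup`, and the sample documents are hypothetical.

```python
def build_term_index(documents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each term to the set of documents indexed by it."""
    index: dict[str, set[str]] = {}
    for doc_id, terms in documents.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def lookup(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents carrying every requested term (logical conjunction)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample: three documents indexed by a handful of terms.
documents = {
    "doc1": {"A", "B", "C"},
    "doc2": {"A", "B", "D"},
    "doc3": {"B", "D"},
}
index = build_term_index(documents)

print(sorted(lookup(index, "A", "B")))
print(sorted(lookup(index, "C", "D")))
```

Only one entry per term is stored; any association of terms is produced when asked for by intersecting the entries involved, so the number of stored elements grows with the vocabulary and not with the number of possible associations.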


